Added store_many_vectors on Mongo Storage #87
base: master
Could you add some tests for `store_many_vectors` to the mongo test suite?
        # Push JSON representation of dict to end of bucket list
        self.mongo_object.insert_one(val_dict)

    def _get_vector(self, hash_name, bucket_key, v, data):
        """
This docstring belongs to the `store_vector` method.
    def store_many_vectors(self, hash_name, bucket_keys, vs, data):
        requests = []

        for v, d, bk in zip(vs, data, bucket_keys):
I suggest using `from future.builtins import zip`, because it is more efficient on Python 2.7: it returns a lazy iterator, whereas the builtin `zip` there builds the whole list first.
Updated the code, thanks
@@ -147,7 +166,7 @@ def get_bucket(self, hash_name, bucket_key):
                                 shape=(val_dict['dim'], 1))

             else:
-                vector = numpy.fromstring(val_dict['vector'],
+                vector = numpy.frombuffer(val_dict['vector'],
I updated this because I got some deprecation warnings
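As a rough illustration of the replacement discussed here, the sketch below round-trips a vector through bytes with `numpy.frombuffer`. The `float64` dtype and the sample values are assumptions, since the diff line is cut off before its remaining arguments. One difference worth knowing: an array created by `frombuffer` over a `bytes` object is a read-only view, so a copy is needed before modifying it.

```python
import numpy

# Assumed dtype and values; the PR diff does not show which dtype it passes.
original = numpy.array([1.0, 2.5, -3.0], dtype=numpy.float64)

# Serialize to raw bytes, as one might before storing in a database.
raw = original.tobytes()

# frombuffer interprets the bytes without copying them.
restored = numpy.frombuffer(raw, dtype=numpy.float64)

# The result is a read-only view over the immutable bytes object,
# so take a copy before changing any element.
writable = restored.copy()
writable[0] = 10.0
```

If the calling code only reads the vector, the copy can be skipped.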
nearpy/storage/storage_mongo.py
Outdated
             {'lsh': {'$regex': self._format_hash_prefix(hash_name)}})

     def clean_all_buckets(self):
         """
         Removes all buckets from all hashes and their content.
         """
-        self.mongo_object.remove(
+        self.mongo_object.delete_many(
The `remove` method is deprecated, so I replaced it with the suggested `delete_many`. This avoids the annoying deprecation warnings.
@amorgun I added tests for the `store_many_vectors` method.
@etudor The test is broken because it cannot import
@amorgun I have updated this.
@etudor It looks like a lot of mongo tests are now broken on Python 2.7. Please check whether this is related to your changes. Maybe you should pin an older version of
I found this method very useful on the Redis storage, so I've added it to the Mongo storage as well.
I haven't benchmarked how much faster it is than single inserts.
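For readers who want the general idea, here is a minimal sketch of a batched insert, not the code from this PR. It assumes a pymongo-style collection object exposing `insert_many`; the document layout (`lsh`, `vector`, `data` keys) and the `hash_name_bucketkey` key format are hypothetical stand-ins, and the function takes the collection as an argument instead of using `self.mongo_object`.

```python
def build_bucket_documents(hash_name, bucket_keys, vs, data):
    """Build one document per (vector, data, bucket key) triple."""
    documents = []
    for v, d, bk in zip(vs, data, bucket_keys):
        documents.append({
            # Hypothetical key format; the real storage may differ.
            'lsh': '%s_%s' % (hash_name, bk),
            'vector': list(v),
            'data': d,
        })
    return documents


def store_many_vectors(collection, hash_name, bucket_keys, vs, data):
    """Insert all documents in one call; return how many were sent."""
    documents = build_bucket_documents(hash_name, bucket_keys, vs, data)
    if documents:
        # Skip the call for empty input: pymongo's insert_many
        # rejects an empty list of documents.
        collection.insert_many(documents)
    return len(documents)
```

Sending one batch replaces a round trip per vector with a single request, which is where any speedup would come from; how large it is would still need to be measured.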